earth and space science
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
Qin, Chuan; Chen, Xin; Wang, Chengrui; Wu, Pengmin; Chen, Xi; Cheng, Yihang; Zhao, Jingyi; Xiao, Meng; Dong, Xiangchao; Long, Qingqing; Pan, Boya; Wu, Han; Li, Chengzan; Zhou, Yuanchun; Xiong, Hui; Zhu, Hengshu
In recent years, the rapid advancement of Artificial Intelligence (AI) technologies, particularly Large Language Models (LLMs), has revolutionized the paradigm of scientific discovery, establishing AI-for-Science (AI4Science) as a dynamic and evolving field. However, an effective framework for the overall assessment of AI4Science is still lacking, particularly one that takes a holistic view of data quality and model capability. In this study, we therefore propose SciHorizon, a comprehensive assessment framework designed to benchmark the readiness of AI4Science from both scientific data and LLM perspectives. First, we introduce a generalizable framework for assessing AI-ready scientific data, encompassing four key dimensions (Quality, FAIRness, Explainability, and Compliance), which are subdivided into 15 sub-dimensions. Drawing on data resource papers published between 2018 and 2023 in peer-reviewed journals, we present recommendation lists of AI-ready datasets for both Earth and Life Sciences, making a novel contribution to the field. Concurrently, to assess the capabilities of LLMs across multiple scientific disciplines, we establish 16 assessment dimensions based on five core indicators (Knowledge, Understanding, Reasoning, Multimodality, and Values), spanning Mathematics, Physics, Chemistry, Life Sciences, and Earth and Space Sciences. Using the developed benchmark datasets, we have conducted a comprehensive evaluation of over 20 representative open-source and closed-source LLMs. All results are publicly available online at www.scihorizon.cn/en.
Ten Ways to Apply Machine Learning in Earth and Space Sciences
Machine learning (ML) is gaining popularity across scientific and technical fields, but it is often not clear to researchers, especially young scientists, how they can apply these methods in their work. In many ways, the Earth and space sciences (ESS) present ideal use cases for ML applications: the problems being addressed, such as climate change, weather forecasting, and natural hazards assessment, are globally important; the data are often freely available, voluminous, and of high quality; and the computational resources required to develop ML models are steadily becoming more affordable. Free computational languages and ML code libraries (e.g., scikit-learn, PyTorch, and TensorFlow) are also now available, making entry barriers lower than ever. Nevertheless, our experience has been that many young scientists and students interested in applying ML techniques to ESS data do not have a clear sense of how to do so. An ML algorithm can be thought of broadly as a mathematical function containing many free parameters (thousands or even millions) that takes inputs (features) and maps those features into one or more outputs (targets).
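The description of an ML algorithm as a parameterized function from features to targets can be illustrated with a minimal sketch. It is not taken from the article: the linear form, the synthetic data, and the names predict, fit, TRUE_WEIGHTS, and TRUE_BIAS are all assumptions chosen for illustration, and a real ESS application would more likely use one of the libraries named above.

```python
import random


def predict(params, features):
    """Map one feature vector to a target using the free parameters."""
    weights, bias = params
    return sum(w * x for w, x in zip(weights, features)) + bias


def fit(samples, targets, lr=0.1, steps=500):
    """Adjust the free parameters by gradient descent on mean squared error."""
    n = len(samples)
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(steps):
        # Residuals are computed once per step, so all parameters
        # are updated from the same set of predictions.
        residuals = [predict((weights, bias), x) - y
                     for x, y in zip(samples, targets)]
        for j in range(len(weights)):
            grad = 2.0 * sum(r * x[j] for r, x in zip(residuals, samples)) / n
            weights[j] -= lr * grad
        bias -= lr * 2.0 * sum(residuals) / n
    return weights, bias


# Synthetic data (hypothetical values): three features, one target.
random.seed(0)
TRUE_WEIGHTS, TRUE_BIAS = [1.5, -2.0, 0.5], 0.3
samples = [[random.gauss(0, 1) for _ in TRUE_WEIGHTS] for _ in range(200)]
targets = [predict((TRUE_WEIGHTS, TRUE_BIAS), x) + random.gauss(0, 0.01)
           for x in samples]

params = fit(samples, targets)
mse = sum((predict(params, x) - y) ** 2
          for x, y in zip(samples, targets)) / len(samples)
print("fitted weights:", [round(w, 2) for w in params[0]])
print("fitted bias:", round(params[1], 2), "mse:", mse)
```

This toy model has only four free parameters (three weights and a bias). The models the text refers to follow the same pattern with thousands or millions of parameters and nonlinear functional forms.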
Second AI and Data Science Workshop for Earth and Space Sciences
NASA's mission of exploration requires new ways to use and learn from the unprecedented amount of data that space-based observation platforms generate. New capabilities are needed, ranging from onboard autonomy for robotic spacecraft to techniques for understanding the world and universe in which we live. Artificial Intelligence (AI) and data science are rapidly becoming integral to NASA's future as drivers of automation and interpretation. AI is a collection of advanced technologies that allow machines to think and act through sensing, comprehending, interacting, and learning. AI's foundations lie at the intersection of several traditional fields: Philosophy, Mathematics, Economics, Neuroscience, Psychology, and Computer Science.